FleaBay: eCommerce OSINT & Network Analysis
September 1, 2019
Tyler Seymour
tylerseymour@protonmail.com
https://tylerseymour.pw
FleaBay is an OSINT workflow for eCommerce network analysis. It maps buyers to sellers in online marketplaces by analyzing publicly available feedback and reviews. The workflow constructs a network graph centered on one or more usernames, allowing investigators to track several types of fraud and criminal activity, including money laundering, fake reviews, suspension circumvention, and cash-out routes. Specifically, FleaBay allows investigators to do the following:
FleaBay collects the reviews left by a single user and finds all of the usernames that are "1-hop" from the target. A hop refers to the distance between the target (reviewer) and the destination (buyer or seller) in a network. As used here, 1-hop usernames are accounts that were directly reviewed by the target.
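As a minimal sketch (assuming the cleaned DataFrame built later in this notebook, with 'origin' and 'username' columns), the 1-hop set is a single filter; one_hop() is a hypothetical helper, not part of the notebook itself:
def one_hop(df, target):
    # Unique accounts that the target has directly reviewed
    return sorted(df.loc[df['origin'] == target, 'username'].unique())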
The tool expands the network to "2-hops" by collecting reviews for each unique 1-hop username. In this case, the path from the target user to the destination is Target --> Intermediary (1-hop) --> Destination (2-hop).
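The 2-hop expansion simply repeats that collection for every 1-hop username. A hedged sketch reusing the hypothetical one_hop() helper above (in practice, FleaBay re-scrapes each intermediary's feedback page rather than filtering rows that were already collected):
def two_hop(df, target):
    # Map each intermediary (1-hop) account to the accounts it has reviewed (2-hop)
    return {intermediary: one_hop(df, intermediary) for intermediary in one_hop(df, target)}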
FleaBay constructs a network graph comprising usernames (nodes) and transactions (edges). This graph represents the eCommerce network.
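A minimal sketch of that construction, assuming df is the cleaned feedback DataFrame built later in this notebook:
import networkx as nx
# Each row becomes an edge from the reviewing account ('origin') to the reviewed account ('username')
g = nx.from_pandas_edgelist(df, source='origin', target='username')
print(g.number_of_nodes(), "nodes |", g.number_of_edges(), "edges")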
FleaBay finds potential paths for money to flow between two users, even when intermediary or proxy accounts sit between them. The path is calculated by finding the shortest path between two nodes in the network using Dijkstra's algorithm.
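The corresponding NetworkX call, sketched here with placeholder usernames (the money-flow cell near the end of this notebook uses the same API on the real graph):
# g is the feedback graph built as above; 'janedoe' and 'johndoe' are hypothetical accounts
path = nx.shortest_path(g, source='janedoe', target='johndoe', method='dijkstra')
print(' --> '.join(path))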
FleaBay only supports eBay transactions at the moment; support for Amazon and other eCommerce marketplaces is planned.
#################
# Basic Imports #
#################
from __future__ import print_function, division
import pandas as pd
import networkx as nx
import numpy as np
import requests
import os
import os.path
from os import path
import zipfile
import glob
###########################
# Visualization Libraries #
###########################
from pyvis import network as net
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
#################
# Hide Warnings #
#################
import warnings
warnings.simplefilter('ignore')
##########
# Output #
##########
print()
print("Imports Complete. ")
This function allows you to import a previously saved DataFrame and pick up where you left off. To import an existing DataFrame, type the filename in the input box; for example, try "janedoe_1hop" or "janedoe_2hop" (without quotes) to load some previously collected data. To create a new DataFrame and move on to the next step, press "Enter" without entering any data.
print()
while True:
try:
pickle = input("Select a DataFrame, or Enter to Continue: \n\n")
if pickle == '':
print("No DataFrame Selected. Move on create a new network graph.")
break
else:
pickleName = ("./flea-exports/" + pickle + ".pkl")
df = pd.read_pickle(pickleName)
print()
print("DataFrame Import Complete.")
print(df.shape)
display(df.head())
break
except FileNotFoundError:
print()
print("Error. Try a different file.")
Before starting the scrape, the first step is to calculate the total number of reviews for the starting "seed" user. The number of reviews can vary from none to millions per username. FleaBay uses the review count to determine the total number of pages, at 200 reviews per page, that need to be scraped.
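For example, a seed username with 450 reviews spans ceil(450 / 200) = 3 pages, so three pages are queued for scraping.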
#####################
# Set Seed Username #
#####################
print()
username = input("Enter a single username to investigate: ")
print("Username to collect: " + username)
print()
###################
# User Input Loop #
###################
while True:
try:
########################
# Calculate Statistics #
########################
html = ('https://feedback.ebay.co.uk/ws/eBayISAPI.dll?ViewFeedback2&ftab=FeedbackLeftForOthers&userid=' + username + '&iid=-1&de=off&items=200&searchInterval=30&mPg=2&page=1')
tempdf = pd.read_html(html, header=0)
tempdf = tempdf[14].copy(deep=False) #Frame 14 contains the number of reviews statistics
fcount = tempdf.columns[0]  # the review count appears in the first column header
flist = fcount.split(' ')
flist[0] = flist[0].replace(',', '')
numReviews = int(flist[0])
pages = (numReviews // 200) + 2
num = pages-1
######################
# Output Information #
######################
print()
print("The username " + username + " has " + str(numReviews) + " reviews.")
print("At 200 reviews per page, there are " + str(num) + " pages that need to be scraped.")
print("For more information, view the " + username + " feedback page at: ")
print(html)
break
###################################
# Handle Username Not Found Error #
###################################
except IndexError:
print()
username = input("Error. Try a different username: ")
##################################
# Scrape Seed Username's Reviews #
##################################
df = pd.DataFrame(columns=['Unnamed: 0', 'Feedback', 'Left for', 'When', 'Unnamed: 4', 'origin'])
for pageNumber in range(1, pages):
html = ('https://feedback.ebay.co.uk/ws/eBayISAPI.dll?ViewFeedback2&ftab=FeedbackLeftForOthers&userid=' + username + '&iid=-1&de=off&items=200&searchInterval=30&mPg=2&page=' + str(pageNumber))
dftemp = pd.read_html(html, header=0)
dftemp = dftemp[15].copy(deep=False) #Frame 15 contains the reviews
dftemp['origin'] = username
frames = [df, dftemp]
df = pd.concat(frames)
print("Scraping page No. " + str(pageNumber))
######################
# Output Information #
######################
print(df.shape)
display(df.head())
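#######################
# Clean the DataFrame #
#######################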
df = df.drop(['Unnamed: 0'], axis=1)
df = df.drop(['Unnamed: 4'], axis=1)
df = df.dropna(inplace=False)
df = df[~df["Feedback"].str.contains('--')]
df = df[~df["Feedback"].str.contains('infoDetailed')]
df = df[~df["Feedback"].str.contains('Reply by')]
df.columns = ['feedback', 'type', 'when', 'origin']
df = df[~df["type"].str.contains('Buyer:')]
df = df[~df["type"].str.contains('--')]
df.reset_index(inplace=True, drop=True)
df[['type', 'username']] = df['type'].str.split(': Member ID ', expand=True)
df = df.dropna()
df['username'] = df['username'].apply(lambda x: x.split(' ')[0])
df.reset_index(inplace=True, drop=True)
print()
print("The shape of the dataframe after the first pass of cleaning is: ")
print(df.shape)
display(df.head())
plt.figure(figsize=(12, 12))
g = nx.from_pandas_edgelist(df, source='origin', target='username')
nx.draw(g, with_labels=True)
print()
imageInput = input("Save PNG Image As: ")
imageName = ("./flea-exports/" + imageInput + ".png")
print("Exported image to " + imageName)
plt.savefig(imageName, transparent=True, dpi=300)
pickle = input("Save Dataframe as: ")
pickleName = ("./flea-exports/" + pickle + ".pkl")
df.to_pickle(pickleName)
print()
print("Exported as " + pickleName)
while True:
try:
pickle = input("Import Dataframe (Enter to Exit) ")
if pickle == "":
break
else:
pickleName = ("./flea-exports/" + pickle + ".pkl")
df = pd.read_pickle(pickleName)
print("Import complete.")
print(df.shape)
display(df.head())
break
except FileNotFoundError:
print()
print("Error. Try a different file.")
usernameList = list(df['username'].unique())
originList = list(df['origin'].unique())
deletedUsers = list([s for s in usernameList if "deleted" in s])
usernameList = [x for x in usernameList if x not in deletedUsers]
print()
print("This dataset includes " + str(len(usernameList)) + " unique accounts, excluding deleted user accounts.")
print()
for item in originList:
if item not in usernameList:
usernameList.append(item)
else:
pass
print(usernameList)
print()
print("Note that the " + str(len(deletedUsers)) + " deleted user accounts that " + username + " has reviewed may be probative of maligned activity.")
print()
print(deletedUsers)
print()
while True:
userInput = input("Add Another Username (Enter to Exit) ")
if userInput == "":
break
else:
usernameList.append(userInput)
print("Added username: " + userInput)
deletedUsers = list([s for s in usernameList if "deleted" in s])
usernameList = [x for x in usernameList if x not in deletedUsers]
print()
print("The following usernames are on the collection list: ")
print(usernameList)
userInputPages = input("Maximum Pages Deep: ")
df = pd.DataFrame(columns=['Unnamed: 0', 'Feedback',
'Left for', 'When', 'Unnamed: 4', 'origin'])
for username in usernameList:
try:
pageNumber = 1
html = ('https://feedback.ebay.co.uk/ws/eBayISAPI.dll?ViewFeedback2&ftab=FeedbackLeftForOthers&userid=' +
username + '&iid=-1&de=off&items=200&searchInterval=30&mPg=2&page=' + str(pageNumber))
tempdf = pd.read_html(html, header=0)
tempdf = tempdf[14].copy(deep=False)
fcount = tempdf.columns[0]
flist = fcount.split(' ')
flist[0] = flist[0].replace(',', '')
numReviews = int(flist[0])
pages = (numReviews // 200) + 2
num = pages-1
print()
print("The username " + username + " has " +
str(numReviews) + " reviews.")
print("At 200 reviews per page, there are " +
str(num) + " pages that need to be scraped.")
print("For more information, view the " +
username + " feedback page at: ")
print(html)
except (ValueError, IndexError):
continue
for page in range(1, pages):
if page > int(userInputPages):
pass
else:
html = ('https://feedback.ebay.co.uk/ws/eBayISAPI.dll?ViewFeedback2&ftab=FeedbackLeftForOthers&userid=' +
username + '&iid=-1&de=off&items=200&searchInterval=30&mPg=2&page=' + str(page))
dftemp = pd.read_html(html, header=0)
dftemp = dftemp[15].copy(deep=False)
dftemp['origin'] = username
frames = [df, dftemp]
df = pd.concat(frames)
print("\tScraping page No. " + str(page) + " of " + str(userInputPages) + " | " + str(num) + " total pages.")
print(df.shape)
display(df.head())
#######################
# Clean the DataFrame #
#######################
df = df.drop(['Unnamed: 0'], axis=1)
df = df.drop(['Unnamed: 4'], axis=1)
df = df.dropna(inplace=False)
df = df[~df["Feedback"].str.contains('--')]
df = df[~df["Feedback"].str.contains('infoDetailed')]
df = df[~df["Feedback"].str.contains('Reply by')]
df.columns = ['feedback', 'type', 'when', 'origin']
df = df[~df["type"].str.contains('Buyer:')]
df = df[~df["type"].str.contains('--')]
df.reset_index(inplace=True, drop=True)
df[['type', 'username']] = df['type'].str.split(': Member ID ', expand=True)
df = df.dropna()
df['username'] = df['username'].apply(lambda x: x.split(' ')[0])
df.reset_index(inplace=True, drop=True)
print()
print("The shape of the dataframe after the first pass of cleaning is: ")
print(df.shape)
display(df.head())
pickle = input("Save Dataframe as: ")
pickleName = ("./flea-exports/" + pickle + ".pkl")
df.to_pickle(pickleName)
print()
print("Exported as " + pickleName)
while True:
try:
pickle = input("Import Dataframe (Enter to Exit): ")
if pickle == '':
break
else:
pickleName = ("./flea-exports/" + pickle + ".pkl")
df = pd.read_pickle(pickleName)
print("Import complete.")
print(df.shape)
display(df.head())
break
except FileNotFoundError:
print()
print("Error. Try a different file.")
print()
print("Warning: Large networks can take several minutes to generate. Be patient. ")
print()
while True:
try:
########################
# Set Layout to Spring #
########################
plt.figure(figsize=(20, 20))
g = nx.from_pandas_edgelist(df, source='origin', target='username')
layout = nx.spring_layout(g, iterations=1000, k=1.25)
################
# Lonely Nodes #
################
sellerUsernames = list(df.username.unique())
lonelyNodes = [review for review in sellerUsernames if g.degree(review) == 0]
g0 = g.subgraph(lonelyNodes).copy()
nx.draw_networkx_nodes(g, layout, node_color='blue', node_size=4)
nx.draw_networkx_edges(g, layout, width=1, edge_color="lightblue")
###################
# Popular Sellers #
###################
popularSellers = [review for review in sellerUsernames if g.degree(review) > 1]
g1 = g.subgraph(popularSellers).copy()
nx.draw_networkx_nodes(g1, layout, nodelist=popularSellers,
node_size=250, node_color='orange')
nx.draw_networkx_edges(g1, layout, edge_color='orange', width=2)
nodeLabelsPopularSellers = dict(zip(popularSellers, popularSellers))
# nx.draw_networkx_labels(g1, layout, labels=nodeLabelsPopularSellers, font_size=14,
# font_color='k', font_family='sans-serif', font_weight='normal')
#########################
# Originating Usernames #
#########################
sellerOrigin = list(df.origin.unique())
sellerOriginSize = [g.degree(review) * 10 for review in sellerOrigin]
g2 = g.subgraph(sellerOrigin).copy()
nx.draw_networkx_nodes(g2, layout, nodelist=sellerOrigin,
node_size=sellerOriginSize, node_color='green')
nx.draw_networkx_edges(g2, layout, edge_color='lightgreen', width=2)
nodeLabelsOrigin = dict(zip(sellerOrigin, sellerOrigin))
nx.draw_networkx_labels(g2, layout, labels=nodeLabelsOrigin, font_size=14,
font_color='k', font_family='sans-serif', font_weight='normal')
#####################
# Deleted Usernames #
#####################
usernameList = list(df['username'].unique())
deletedUsers = list([s for s in usernameList if "deleted" in s])
g3 = g.subgraph(deletedUsers).copy()
nx.draw_networkx_nodes(g3, layout, nodelist=deletedUsers, node_size=50, node_color='red')
nx.draw_networkx_edges(g3, layout, edge_color='pink', width=2)
nodeLabelsDeletedUsers = dict(zip(deletedUsers, deletedUsers))
# nx.draw_networkx_labels(g3, layout, labels=nodeLabelsDeletedUsers, font_size=14,
# font_color='k', font_family='sans-serif', font_weight='normal')
########################
# Display Plot Options #
########################
plotTitle = "eBay Seller Network-Spring Layout"
plt.axis('off')
plt.title(plotTitle)
saveLoc = ("./flea-exports/" + plotTitle)
plt.savefig(saveLoc, transparent=True, dpi=300)
plt.show()
break
except nx.NodeNotFound:
pass
# sourceS = input("Error. Try a different username: ")
print()
print("Warning: Large networks can take several minutes to generate. Be patient. ")
print()
while True:
try:
######################################
# Set Layout to Fruchterman-Reingold #
######################################
plt.figure(figsize=(20, 20))
g = nx.from_pandas_edgelist(df, source='origin', target='username')
layout = nx.fruchterman_reingold_layout(g, iterations=500, k=1)
################
# Lonely Nodes #
################
sellerUsernames = list(df.username.unique())
lonelyNodes = [review for review in sellerUsernames if g.degree(review) == 0]
g0 = g.subgraph(lonelyNodes).copy()
nx.draw_networkx_nodes(g, layout, node_color='blue', node_size=4)
nx.draw_networkx_edges(g, layout, width=1, edge_color="lightblue")
###################
# Popular Sellers #
###################
popularSellers = [review for review in sellerUsernames if g.degree(review) > 1]
g1 = g.subgraph(popularSellers).copy()
nx.draw_networkx_nodes(g1, layout, nodelist=popularSellers,
node_size=250, node_color='orange')
nx.draw_networkx_edges(g1, layout, edge_color='orange', width=2)
nodeLabelsPopularSellers = dict(zip(popularSellers, popularSellers))
# nx.draw_networkx_labels(g1, layout, labels=nodeLabelsPopularSellers, font_size=14,
# font_color='k', font_family='sans-serif', font_weight='normal')
#########################
# Originating Usernames #
#########################
sellerOrigin = list(df.origin.unique())
sellerOriginSize = [g.degree(review) * 10 for review in sellerOrigin]
g2 = g.subgraph(sellerOrigin).copy()
nx.draw_networkx_nodes(g2, layout, nodelist=sellerOrigin,
node_size=sellerOriginSize, node_color='green')
nx.draw_networkx_edges(g2, layout, edge_color='lightgreen', width=2)
nodeLabelsOrigin = dict(zip(sellerOrigin, sellerOrigin))
nx.draw_networkx_labels(g2, layout, labels=nodeLabelsOrigin, font_size=14,
font_color='k', font_family='sans-serif', font_weight='normal')
#####################
# Deleted Usernames #
#####################
usernameList = list(df['username'].unique())
deletedUsers = list([s for s in usernameList if "deleted" in s])
g3 = g.subgraph(deletedUsers).copy()
nx.draw_networkx_nodes(g3, layout, nodelist=deletedUsers, node_size=50, node_color='red')
nx.draw_networkx_edges(g3, layout, edge_color='pink', width=2)
nodeLabelsDeletedUsers = dict(zip(deletedUsers, deletedUsers))
# nx.draw_networkx_labels(g3, layout, labels=nodeLabelsDeletedUsers, font_size=14,
# font_color='k', font_family='sans-serif', font_weight='normal')
########################
# Display Plot Options #
########################
plotTitle = "eBay Seller Network - Fruchterman Layout"
plt.axis('off')
plt.title(plotTitle)
saveLoc = ("./flea-exports/" + plotTitle)
plt.savefig(saveLoc, transparent=True, dpi=300)
plt.show()
break
except nx.NodeNotFound:
pass
# sourceS = input("Error. Try a different username: ")
print()
print("Warning: Large networks can take several minutes to generate. Be patient. ")
print()
while True:
try:
######################################
# Set Layout to Fruchterman-Reingold #
######################################
plt.figure(figsize=(20, 20))
g = nx.from_pandas_edgelist(df, source='origin', target='username')
layout = nx.fruchterman_reingold_layout(g, iterations=1000, k=0.8)
################
# Lonely Nodes #
################
sellerUsernames = list(df.username.unique())
lonelyNodes = [review for review in sellerUsernames if g.degree(review) == 0]
g0 = g.subgraph(lonelyNodes).copy()
nx.draw_networkx_nodes(g, layout, node_color='blue', node_size=4)
nx.draw_networkx_edges(g, layout, width=1, edge_color="lightblue")
###################
# Popular Sellers #
###################
popularSellers = [review for review in sellerUsernames if g.degree(review) > 1]
g1 = g.subgraph(popularSellers).copy()
nx.draw_networkx_nodes(g1, layout, nodelist=popularSellers,
node_size=250, node_color='orange')
nx.draw_networkx_edges(g1, layout, edge_color='orange', width=2)
nodeLabelsPopularSellers = dict(zip(popularSellers, popularSellers))
nx.draw_networkx_labels(g1, layout, labels=nodeLabelsPopularSellers, font_size=14,
font_color='k', font_family='sans-serif', font_weight='normal')
#########################
# Originating Usernames #
#########################
sellerOrigin = list(df.origin.unique())
sellerOriginSize = [g.degree(review) * 10 for review in sellerOrigin]
g2 = g.subgraph(sellerOrigin).copy()
nx.draw_networkx_nodes(g2, layout, nodelist=sellerOrigin,
node_size=sellerOriginSize, node_color='green')
nx.draw_networkx_edges(g2, layout, edge_color='lightgreen', width=2)
nodeLabelsOrigin = dict(zip(sellerOrigin, sellerOrigin))
# nx.draw_networkx_labels(g2, layout, labels=nodeLabelsOrigin, font_size=14,
# font_color='k', font_family='sans-serif', font_weight='normal')
#####################
# Deleted Usernames #
#####################
usernameList = list(df['username'].unique())
deletedUsers = list([s for s in usernameList if "deleted" in s])
g3 = g.subgraph(deletedUsers).copy()
nx.draw_networkx_nodes(g3, layout, nodelist=deletedUsers, node_size=50, node_color='red')
nx.draw_networkx_edges(g3, layout, edge_color='pink', width=2)
nodeLabelsDeletedUsers = dict(zip(deletedUsers, deletedUsers))
# nx.draw_networkx_labels(g3, layout, labels=nodeLabelsDeletedUsers, font_size=14,
# font_color='k', font_family='sans-serif', font_weight='normal')
########################
# Display Plot Options #
########################
plotTitle = "eBay Seller Network - Fruchterman Layout II"
plt.axis('off')
plt.title(plotTitle)
saveLoc = ("./flea-exports/" + plotTitle)
plt.savefig(saveLoc, transparent=True, dpi=300)
plt.show()
break
except nx.NodeNotFound:
pass
# sourceS = input("Error. Try a different username: ")
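The egocentric layout below restricts the plot to the ego graph of a chosen username: that account plus every account within one hop of it.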
print()
sourceS = input("Enter a username to center graph: ")
print()
print("Warning: Large networks can take several minutes to generate. Please be patient! ")
while True:
try:
########################
# Set Layout to Spring #
########################
plt.figure(figsize=(20, 20))
g = nx.from_pandas_edgelist(df, source='origin', target='username')
gEgo = nx.ego_graph(g,n=sourceS)
g = nx.from_pandas_edgelist(df, source='origin', target='username', create_using=gEgo)
layout = nx.spring_layout(g,iterations=500, k=0.8)
################
# Lonely Nodes #
################
sellerUsernames = list(df.username.unique())
lonelyNodes = [review for review in sellerUsernames if g.degree(review) == 0]
g0 = g.subgraph(lonelyNodes).copy()
nx.draw_networkx_nodes(g, layout, node_color='blue', node_size=4)
nx.draw_networkx_edges(g, layout, width=1, edge_color="lightblue")
# nodeLabelsLonelyNodes = dict(zip(lonelyNodes, lonelyNodes))
###################
# Popular Sellers #
###################
popularSellers = [review for review in sellerUsernames if g.degree(review) > 1]
g1 = g.subgraph(popularSellers).copy()
nx.draw_networkx_nodes(g1, layout, nodelist=popularSellers,
node_size=250, node_color='orange')
nx.draw_networkx_edges(g1, layout, edge_color='orange', width=2)
nodeLabelsPopularSellers = dict(zip(popularSellers, popularSellers))
# nx.draw_networkx_labels(g1, layout, labels=nodeLabelsPopularSellers, font_size=14,
# font_color='k', font_family='sans-serif', font_weight='normal')
#########################
# Originating Usernames #
#########################
sellerOrigin = list(df.origin.unique())
sellerOriginSize = [g.degree(review) * 10 for review in sellerOrigin]
g2 = g.subgraph(sellerOrigin).copy()
nx.draw_networkx_nodes(g2, layout, nodelist=sellerOrigin,
node_size=sellerOriginSize, node_color='green')
nx.draw_networkx_edges(g2, layout, edge_color='lightgreen', width=2)
nodeLabelsOrigin = dict(zip(sellerOrigin, sellerOrigin))
nx.draw_networkx_labels(g2, layout, labels=nodeLabelsOrigin, font_size=14,
font_color='k', font_family='sans-serif', font_weight='normal')
#####################
# Deleted Usernames #
#####################
usernameList = list(df['username'].unique())
deletedUsers = list([s for s in usernameList if "deleted" in s])
g3 = g.subgraph(deletedUsers).copy()
nx.draw_networkx_nodes(g3, layout, nodelist=deletedUsers, node_size=50, node_color='red')
nx.draw_networkx_edges(g3, layout, edge_color='pink', width=2)
nodeLabelsDeletedUsers = dict(zip(deletedUsers, deletedUsers))
# nx.draw_networkx_labels(g3, layout, labels=nodeLabelsDeletedUsers, font_size=14,
# font_color='k', font_family='sans-serif', font_weight='normal')
########################
# Display Plot Options #
########################
plotTitle = "eBay Seller Network - Egocentric Layout"
plt.axis('off')
plt.title(plotTitle)
saveLoc = ("./flea-exports/" + plotTitle)
plt.savefig(saveLoc, transparent=True, dpi=300)
plt.show()
break
except nx.NodeNotFound:
pass
# sourceS = input("Error. Try a different username: ")
while True:
try:
pickle = input("Import Dataframe (Enter to Exit) ")
if pickle == "":
break
else:
pickleName = ("./flea-exports/" + pickle + ".pkl")
df = pd.read_pickle(pickleName)
print("Import complete.")
print(df.shape)
display(df.head())
break
except FileNotFoundError:
print()
print("Error. Try a different file.")
sourceS = input("Source Username: ")
graphnameInput = input("Name your graph to download or view in a new window: ")
g = nx.from_pandas_edgelist(df, source='origin', target='username')
gEgo = nx.ego_graph(g, n=sourceS)
g = nx.from_pandas_edgelist(
df, source='origin', target='username', create_using=gEgo)
g1 = net.Network(notebook=False, height="90%", width="100%",
bgcolor="#222222", font_color="white")
g1.toggle_physics(True)
g1.show_buttons(filter_='physics')
g1.from_nx(g)
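# Edge metadata (sources, targets, degree-based weights); collected here but not used further in this cell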
sources = list(df.origin.unique())
targets = list(df['username'].unique())
weights = [g.degree(etarget) * 1 for etarget in targets]
edge_data = zip(sources, targets, weights)
graphname = ("./flea-exports/" + graphnameInput + ".html")
g1.write_html(graphname, notebook=False)
cwd = os.getcwd()
user = (cwd.split("-"))
print()
print("WARNING: this is a large dataframe. This may take a few minutes depending how large your network is!")
print()
print("https://tylerseymour.pw/user/" + user[1] + "/view/flea-exports/" + graphnameInput + ".html")
print()
print()
print("Enter the source and target usernames to visualize the shortest path: ")
while True:
try:
#########################
# Set Source and Target #
#########################
print()
sourceS = input("Source Username: ")
targetT = input("Target Username: ")
print()
########################
# Set Layout to Spring #
########################
plt.figure(figsize=(20, 20))
g = nx.from_pandas_edgelist(df, source='origin', target='username')
layout = nx.spring_layout(g,iterations=500, k=0.8)
################
# Lonely Nodes #
################
sellerUsernames = list(df.username.unique())
lonelyNodes = [review for review in sellerUsernames if g.degree(review) == 0]
g0 = g.subgraph(lonelyNodes).copy()
nx.draw_networkx_nodes(g0, layout, node_color='blue', node_size=4)
nx.draw_networkx_edges(g0, layout, width=1, edge_color="lightblue")
nodeLabelsLonelyNodes = dict(zip(lonelyNodes, lonelyNodes))
# nx.draw_networkx_labels(g, layout, labels=nodeLabelsLonelyNodes, font_size=6, font_color='k', font_family='sans-serif', font_weight='normal')
###################
# Popular Sellers #
###################
popularSellers = [review for review in sellerUsernames if g.degree(review) > 1]
g1 = g.subgraph(popularSellers).copy()
nx.draw_networkx_nodes(g1, layout, nodelist=popularSellers,
node_size=250, node_color='orange')
nx.draw_networkx_edges(g1, layout, edge_color='orange', width=2)
nodeLabelsPopularSellers = dict(zip(popularSellers, popularSellers))
nx.draw_networkx_labels(g1, layout, labels=nodeLabelsPopularSellers, font_size=10, font_color='k', font_family='sans-serif', font_weight='normal')
#########################
# Originating Usernames #
#########################
sellerOrigin = list(df.origin.unique())
sellerOriginSize = [g.degree(review) * 10 for review in sellerOrigin]
g2 = g.subgraph(sellerOrigin).copy()
nx.draw_networkx_nodes(g2, layout, nodelist=sellerOrigin,
node_size=sellerOriginSize, node_color='green')
nx.draw_networkx_edges(g2, layout, edge_color='lightgreen', width=2)
nodeLabelsOrigin = dict(zip(sellerOrigin, sellerOrigin))
nx.draw_networkx_labels(g2, layout, labels=nodeLabelsOrigin, font_size=10, font_color='k', font_family='sans-serif', font_weight='normal')
#####################
# Deleted Usernames #
#####################
usernameList = list(df['username'].unique())
deletedUsers = list([s for s in usernameList if "deleted" in s])
g3 = g.subgraph(deletedUsers).copy()
nx.draw_networkx_nodes(g3, layout, nodelist=deletedUsers, node_size=100, node_color='pink')
nx.draw_networkx_edges(g3, layout, edge_color='pink', width=2)
nodeLabelsDeletedUsers = dict(zip(deletedUsers, deletedUsers))
# nx.draw_networkx_labels(g3, layout, labels=nodeLabelsDeletedUsers, font_size=14,
# font_color='k', font_family='sans-serif', font_weight='normal')
#######################################
# Display Shortest Path Between Nodes #
#######################################
path = nx.shortest_path(g, source=sourceS, target=targetT, method='dijkstra')
g9 = g.subgraph(path).copy()
pathLabels = dict(zip(path, path))
nx.draw_networkx_nodes(g9, layout, nodelist=path,
node_size=1500, node_color='blue')
nx.draw_networkx_edges(g9, layout, edge_color='red', width=8)
nx.draw_networkx_labels(g9, layout, labels=pathLabels, font_size=18,
font_color='red', font_family='sans-serif', font_weight='bold')
########################
# Display Plot Options #
########################
plotTitle = (sourceS + " --> " + targetT + "\nMoney Flow Analysis")
plt.axis('off')
plt.title(plotTitle)
saveLoc = ("./flea-exports/" + sourceS + " to " + targetT + " Money Flow Analysis")
plt.savefig(saveLoc, transparent=True, dpi=300)
plt.show()
break
except nx.NodeNotFound:
print("Error. Either the source or target is not in the graph. Try different inputs.")
sourceS = input("New source username: ")
targetT = input("New target username: ")
print()
while True:
try:
pickle = input("Import Dataframe (Enter to Exit): ")
if pickle == "":
break
else:
pickleName = ("./flea-exports/" + pickle + ".pkl")
df = pd.read_pickle(pickleName)
print()
print("Import complete.")
print(df.shape)
display(df.head())
break
except FileNotFoundError:
print()
print("Error. Try a different file.")
dfList = []
dfs = []
dfCombo = pd.DataFrame()
while True:
try:
dfName = input("Add a DataFrame (Enter to Exit) ")
if dfName == "":
break
else:
dfName = ("./flea-exports/" + dfName + ".pkl")
dfList.append(dfName)
print(dfList)
except:
print("Error, try another name (Enter to Exit) ")
for name in dfList:
dfAdd = pd.read_pickle(name)
dfs.append(dfAdd)
dfCombo = pd.concat(dfs)
dfCombo.reset_index(inplace=True, drop=True)
comboName = input("Save Combo Frame As:")
comboName = ("./flea-exports/" + comboName + ".pkl")
dfCombo.to_pickle(comboName)
print(dfCombo.shape)
display(dfCombo.head())
print()
while True:
try:
pickle = input("Import Dataframe (Enter to Exit): ")
if pickle == "":
break
else:
pickleName = ("./flea-exports/" + pickle + ".pkl")
df = pd.read_pickle(pickleName)
print()
print("Import complete.")
print(df.shape)
display(df.head())
break
except FileNotFoundError:
print()
print("Error. Try a different file.")
print()
usernameList = list(df.username.unique())
usernameDict = {}
count = 0
for username in usernameList:
try:
html = ('https://feedback.ebay.co.uk/ws/eBayISAPI.dll?ViewFeedback2&ftab=FeedbackLeftForOthers&userid=' +
username + '&iid=-1&de=off&items=200&searchInterval=30&mPg=2&page=1')
tempdf = pd.read_html(html, header=0)
feedbackCountdf = tempdf[14].copy(deep=False)
fcount = feedbackCountdf.columns[0]
flist = fcount.split(' ')
flist[0] = flist[0].replace(',', '')
numReviews = int(flist[0])
pages = (numReviews // 200) + 2
usernameDict[username] = pages
count = count + 1
print("Success\t" + username + " | " + str(numReviews) +
" Reviews" + " | " + str(count) + "/" + str(len(usernameList)))
pass
except (IndexError, ValueError):
count = count + 1
print("Skipped\t" + username + " | " +
str(count) + "/" + str(len(usernameList)))
pass
pass
print()
print("Complete.")
FleaBay is a versatile tool with several modes of input and output, plus a handful of utilities:
Input:
- Single Username
- Example: janedoe
Output:
- 1-Hop Graph Visualization (.png);
- Exported Graph Data (.pkl).
Input:
- Multiple Usernames
Output:
- 2-Hop Graph Visualization (.png);
- Interactive Graph Visualization (.html);
- Exported Graph Data (.pkl).
Input:
- Source and Destination Usernames.
Output:
- Shortest path from Source to Destination Visualization (.png)
Input:
- List of cleaned, exported DataFrames (.pkl)
Output:
- Combined (Single) Graph Data (.pkl).